An R Package to Read CCCCO MIS Files
Christian Million
Data Analyst
Yosemite Community College District
Main Goal: Showcase benefits of developing internal tools with R.
comis
Every term, someone at your college converts SIS data into .DAT files, using the file specs in the Data Element Dictionary (DED).
These are submission files.
After submission, colleges request referential files from the CCCCO.
These contain elements derived from submission files, explicit formatting, and additional student information.
25 files | 396 elements
Fixed Width Format
No Column Names
Numbers that should be characters / dates
Missing values (NA)
Trailing white space
Implied decimal points
27 files | 406 elements
Tab Delimited
No Column Names
Numbers that should be characters / dates
Missing values (NA)
Trailing white space
Implied decimal points
Different date format than submission file.
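Two of the quirks listed above, implied decimal points and dates stored as digits, can be handled with small helpers. The following is a minimal base R sketch, not code from the package: the function names, the two implied decimal places, and the "%Y%m%d" date format are all assumptions made for illustration. The real values come from the file specs.

```r
# Convert a character field with an implied decimal point to numeric.
# `digits` is the number of implied decimal places (assumed here; check the file spec).
implied_decimal <- function(x, digits) {
  x <- trimws(x)                    # drop padding
  x[!nzchar(x)] <- NA_character_    # blank fields become NA
  as.numeric(x) / 10^digits
}

# Parse a date stored as digits. The format is passed explicitly because
# submission and referential files use different date formats.
parse_mis_date <- function(x, format) {
  x <- trimws(x)
  x[!nzchar(x)] <- NA_character_
  as.Date(x, format = format)
}

# Illustrative values only
units <- implied_decimal(c("0300", "0450", "    "), digits = 2)
start <- parse_mis_date(c("20220825", "        "), format = "%Y%m%d")
```

Passing the format as an argument keeps one helper usable for both file families.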
A lot to re-remember
Cognitively taxing to implement
Takes time
Updates to multiple scripts
Copy / paste errors
Makes scripts more difficult to read
Unfulfilling
Lots of overhead before analysis can begin
# Reading the CB and XB files by hand, without comis
library(dplyr)
library(readr)
# Column names, types, and widths for the CB file
CB_col_names <- c('GI90', 'GI01', 'GI03', paste0("CB0", 0:9), paste0("CB", 10:27), "Filler")
CB_col_types <- strrep("c", length(CB_col_names))  # one "c" per column: read everything as character
CB_col_width <- c(2, 3, 3, 12, 12, 68, 6, 1, 1, length(109:112), length(113:116),
                  1, 1, 1, 1, 1, 1, 6, 8, length(137:148), length(149:160),
                  length(161:172), 7, 9, 1, 1, 1, 1, 1, 1, 1, 26)
# The same three definitions again for the XB file
XB_col_names <- c('GI90', 'GI01', 'GI03', 'GI02', 'CB01', paste0('XB0', 0:9),
                  'XB10', 'XB11', 'XB12', 'CB00', 'Filler')
XB_col_types <- strrep("c", length(XB_col_names))
XB_col_width <- c(2, 3, 3, 3, 12, 6, 1, 6, 6, 1, length(44:47), length(48:51),
                  1, 1, 1, 1, length(56:61), 1, 12, 7)
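# Submission files are fixed width, so read them with the widths defined above
CB_src <- readr::read_fwf("path/to/U59223CB.dat",
                          col_positions = readr::fwf_widths(CB_col_width, CB_col_names),
                          col_types = CB_col_types,
                          trim_ws = TRUE)
XB_src <- readr::read_fwf("path/to/U59223XB.dat",
                          col_positions = readr::fwf_widths(XB_col_width, CB_col_names),  # copy / paste error: should be XB_col_names
                          col_types = XB_col_types,
                          trim_ws = TRUE)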
# date_cleaning_code() and implicit_decimal_code() stand in for the real cleaning steps
CB <- CB_src |>
  mutate(dates = date_cleaning_code(),
         units = implicit_decimal_code())
XB <- XB_src |>
  mutate(dates = date_cleaning_code(),
         units = implicit_decimal_code())
comis
Easier to tell what’s happening
Reduces cognitive overhead
Documentation contained within the package
Updates made in one spot (instead of throughout various scripts)
Shifts focus to what’s important - Using the Data
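The "updates made in one spot" idea can be sketched as a single layout table plus one reader function. This is an illustration only and not the package's actual interface: `mis_specs`, `read_mis`, and the three-column "XX" layout are hypothetical. The sketch uses base R's `read.fwf` so that it runs without extra packages.

```r
# Hypothetical layout table: one entry per file type, kept in a single place.
# The "XX" names and widths are made up for illustration, not a real DED layout.
mis_specs <- list(
  XX = list(
    col_names = c("ID", "TERM", "UNITS"),
    widths    = c(3, 3, 4)
  )
)

# Read one fixed-width file using the layout registered for `file_type`.
read_mis <- function(path, file_type, specs = mis_specs) {
  spec <- specs[[file_type]]
  if (is.null(spec)) stop("No layout registered for file type: ", file_type)
  out <- utils::read.fwf(
    path,
    widths       = spec$widths,
    col.names    = spec$col_names,
    colClasses   = "character",  # keep codes such as "007" as text
    comment.char = ""
  )
  # Trim padding and turn blank fields into NA.
  out[] <- lapply(out, function(col) {
    col <- trimws(col)
    col[!nzchar(col)] <- NA_character_
    col
  })
  out
}

# Usage with a small temporary file standing in for a real submission file.
tmp <- tempfile(fileext = ".dat")
writeLines(c("0012230300", "002223    "), tmp)
demo <- read_mis(tmp, "XX")
```

When a layout changes, only the entry in `mis_specs` needs editing. Every script that calls `read_mis()` picks up the change.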
Contains useful data found on CCCCO websites
Read many files at once
Read from repo
Use DED Name or Descriptive Name
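Switching between DED names and descriptive names could work roughly as in the sketch below. The lookup values and the function `use_descriptive_names` are hypothetical. They are not the names the package actually uses.

```r
# Hypothetical lookup from DED element names to descriptive names.
descriptive_names <- c(
  CB01 = "course_id",
  CB02 = "course_title"
)

# Rename any columns found in the lookup; leave the rest untouched.
use_descriptive_names <- function(df, lookup = descriptive_names) {
  hit <- names(df) %in% names(lookup)
  names(df)[hit] <- unname(lookup[names(df)[hit]])
  df
}

# Illustrative data only
courses <- data.frame(CB01 = "ENGL101", CB02 = "Composition", GI03 = "223")
named <- use_descriptive_names(courses)
```

Columns without an entry in the lookup (here `GI03`) keep their DED name, so the helper is safe to apply to any file.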
Addresses problems specific to the institution
Reasonable defaults
Abstracts common tasks
Maintainable
Share code with others
DisImpact (“internal” to CCCCO)
yccdDB (creates and manages DB connections / queries)
hub (.Rmd/.Qmd storage and usage monitoring)
yccdTemplates (project / analysis / report templates)
yccdThemes (branding graphs / reports)
yccdTerms (help with term math / formatting)
Pier to Pier | 2022-08-25